Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Detect and throw error when model output or model metadata files are deleted/modified. #63

Merged
merged 3 commits into from
Dec 6, 2023

Conversation

annakrystalli
Copy link
Member

@annakrystalli annakrystalli commented Nov 28, 2023

PR adds functionality tovalidate_pr() to check for deletions of previously submitted model metadata files and modifications or deletions of previously submitted model output files, adding an <error/check_error> class object to the function output for each detected modified/deleted file

Testing uses the following PR: hubverse-org/ci-testhub-simple#6

@annakrystalli annakrystalli requested a review from elray1 November 28, 2023 12:07
@annakrystalli
Copy link
Member Author

Demo of functionality including modification & deletion of model output files + deletion of model metadata file in a PR

library(hubValidations)
temp_hub <- fs::path(tempdir(), "mod_del_hub")
gert::git_clone(
    url = "https://github.com/Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
    path = temp_hub,
    branch = "test-mod-del"
)


validate_pr(
    hub_path = temp_hub,
    gh_repo = "Infectious-Disease-Modeling-Hubs/ci-testhub-simple",
    pr_number = 6,
    skip_submit_window_check = TRUE
)
#> ℹ PR contains commits to additional files which have not been checked:
#> • ".github/workflows/validate_submission.yaml"
#> • "README.md"
#> • "model-metadata/README.md"
#> • "model-output/hub-baseline/README.txt"
#> • "random-file.txt"
#> ✔ mod_del_hub: All hub config files are valid.
#> ✖ 2022-10-08-hub-baseline.csv: Previously submitted model output files must not
#>   be modified.  'model-output/hub-baseline/2022-10-08-hub-baseline.csv'
#>   modified.
#> ✖ 2022-10-15-team1-goodmodel.csv: Previously submitted model output files must
#>   not be removed.
#>   'model-output/team1-goodmodel/2022-10-15-team1-goodmodel.csv' removed.
#> ✖ team1-goodmodel.yaml: Previously submitted model metadata files must not be
#>   removed.  'model-metadata/team1-goodmodel.yaml' removed.
#> ✔ 2022-10-08-hub-baseline.csv: File exists at path
#>   'model-output/hub-baseline/2022-10-08-hub-baseline.csv'.
#> ✔ 2022-10-08-hub-baseline.csv: File name "2022-10-08-hub-baseline.csv" is
#>   valid.
#> ✔ 2022-10-08-hub-baseline.csv: File directory name matches `model_id` metadata
#>   in file name.
#> ✔ 2022-10-08-hub-baseline.csv: `round_id` is valid.
#> ✔ 2022-10-08-hub-baseline.csv: File is accepted hub format.
#> ✔ 2022-10-08-hub-baseline.csv: Metadata file exists at path
#>   'model-metadata/hub-baseline.yml'.
#> ✔ 2022-10-08-hub-baseline.csv: File could be read successfully.
#> ✔ 2022-10-08-hub-baseline.csv: `round_id_col` name is valid.
#> ✔ 2022-10-08-hub-baseline.csv: `round_id` column "origin_date" contains a
#>   single, unique round ID value.
#> ✔ 2022-10-08-hub-baseline.csv: All `round_id_col` "origin_date" values match
#>   submission `round_id` from file name.
#> ✔ 2022-10-08-hub-baseline.csv: Column names are consistent with expected round
#>   task IDs and std column names.
#> ✔ 2022-10-08-hub-baseline.csv: Column data types match hub schema.
#> ✔ 2022-10-08-hub-baseline.csv: `tbl` contains valid values/value combinations.
#> ✔ 2022-10-08-hub-baseline.csv: All combinations of task ID
#>   column/`output_type`/`output_type_id` values are unique.
#> ✔ 2022-10-08-hub-baseline.csv: Required task ID/output type/output type ID
#>   combinations all present.
#> ✔ 2022-10-08-hub-baseline.csv: Values in column `value` all valid with respect
#>   to modeling task config.
#> ✔ 2022-10-08-hub-baseline.csv: Values in `value` column are non-decreasing as
#>   output_type_ids increase for all unique task ID value/output type
#>   combinations of quantile or cdf output types.
#> ℹ 2022-10-08-hub-baseline.csv: No pmf output types to check for sum of 1. Check
#>   skipped.
#> ✔ 2022-10-22-team1-goodmodel.csv: File exists at path
#>   'model-output/team1-goodmodel/2022-10-22-team1-goodmodel.csv'.
#> ✔ 2022-10-22-team1-goodmodel.csv: File name "2022-10-22-team1-goodmodel.csv" is
#>   valid.
#> ✔ 2022-10-22-team1-goodmodel.csv: File directory name matches `model_id`
#>   metadata in file name.
#> ✔ 2022-10-22-team1-goodmodel.csv: `round_id` is valid.
#> ✔ 2022-10-22-team1-goodmodel.csv: File is accepted hub format.
#> ✖ 2022-10-22-team1-goodmodel.csv: Metadata file does not exist at path
#>   'model-metadata/team1-goodmodel.yml' or
#>   'model-metadata/team1-goodmodel.yaml'.

Created on 2023-11-28 with reprex v2.0.2

Copy link
Contributor

@LucieContamin LucieContamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After running the code locally, I have some questions. Some of the tests are falling locally might be because of a version of the rlang package maybe:

── Failure ('test-validate_model_file.R:58:3'): validate_model_file print method work [fancy] ──
Snapshot of code has changed:
old[1:6] vs new[1:6]
  Code
    validate_model_file(hub_path, file_path = "team1-goodmodel/2022-10-15-hub-baseline.csv")
- Message
+ Message <rlang_message>
    ✔ 2022-10-15-hub-baseline.csv: File exists at path model-output/team1-goodmodel/2022-10-15-hub-baseline.csv.
    ✔ 2022-10-15-hub-baseline.csv: File name "2022-10-15-hub-baseline.csv" is valid.
    ! 2022-10-15-hub-baseline.csv: File directory name must match `model_id` metadata in file name.  File should be submitted to directory "hub-baseline" not "team1-goodmodel"

* Run `testthat::snapshot_accept('validate_model_file')` to accept the change.
* Run `testthat::snapshot_review('validate_model_file')` to interactively review the change.

I don't think it blocks the PR but wanted to warn you!

@@ -1,3 +1,7 @@
# hubValidations 0.0.0.9006

* `validate_pr()` now checks for deletions of previously submitted model metadata files and modifications or deletions of previously submitted model output files, adding an `<error/check_error>` class object to the function output for each detected modified/deleted file (#43 & #44).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some thinking, I wonder if it makes sense to have the possibility (maybe in a future version) to set this check to returns a warning instead of an error. I am particularly thinking about the Scenario projection rounds, where during an ongoing round, a team might want to submit an update of their projections.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So a warning would still get flagged and fail the workflow. The error just made it a bit more visually obvious when printing the results.

Do you want it to be able to pass validation?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, I guess if it's in the ongoing round, it will make sense for us to have it pass the validation, can it be a parameter the hub set? so that depending on the hub they can decide if they want this type of issue to fail or pass the validation (with just a warning/message)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be something we can introduce indeed. Fancy adding an issue with how you envisage it best working for you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes will do that! thanks

@annakrystalli
Copy link
Member Author

After running the code locally, I have some questions. Some of the tests are falling locally might be because of a version of the rlang package maybe:

Indeed it is to do with the version of rlang. I believe if you update your rlang version that should go away.

@annakrystalli
Copy link
Member Author

Actually it's not the rlang version that is the issue. I think it might be a testthat version issue.

See for example this commit I had to make to update broken tests due to new testthat version. hubverse-org/hubUtils@8be9dbc

@LucieContamin
Copy link
Contributor

Actually it's not the rlang version that is the issue. I think it might be a testthat version issue.

See for example this commit I had to make to update broken tests due to new testthat version. Infectious-Disease-Modeling-Hubs/hubUtils@8be9dbc

Thanks, I miss the version information on the DESCRIPTION file, sorry about that! Updating testthat fixed the issue! THanks again

@annakrystalli
Copy link
Member Author

Yeyyy 🚀

Thanks again for your review @LucieContamin !

@annakrystalli annakrystalli merged commit 5e7ac71 into main Dec 6, 2023
10 checks passed
@annakrystalli annakrystalli deleted the mod-delete-alert branch December 6, 2023 14:58
@annakrystalli annakrystalli restored the mod-delete-alert branch December 19, 2023 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
2 participants